Syntactic Approximation Using Iterative Lexical Analysis

Authors

  • Anthony Cox
  • Charles L. A. Clarke
Abstract

Syntactic irregularities, which often occur in source code undergoing maintenance, prevent the application of analysis and comprehension tools that employ traditional parsing techniques. As an alternative to parsing, we have developed an iterative lexical technique that is based on the repeated application of regular expressions using a shortest-match strategy. The approach recognizes syntactic elements using iterative refinement, where unambiguous constructs are identified first to provide contextual cues for the identification of more ambiguous constructs. The use of a shortest-match strategy supports the bottom-up construction of a syntax tree by identifying smaller subtrees first. To examine the technique’s effectiveness, we present the results of an experiment comparing iterative lexical analysis against parsing. Precision and recall are used to evaluate and compare the two approaches.
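The idea of repeatedly applying regular expressions so that smaller constructs are recognized first can be illustrated with a minimal Python sketch. This is not the authors' implementation: the rule set `RULES`, the function `iterative_lex`, and the `@n@` placeholder format are hypothetical names chosen for illustration. Python's `re` module has no shortest-match mode, so each pattern here excludes its own delimiters, which forces it to match only the innermost occurrence and approximates the shortest-match behavior described above.

```python
import re

# Hypothetical rule set (an assumption, not taken from the paper).
# Rules are tried in order, least ambiguous first. Each pattern excludes
# its own delimiters, so it can only match an innermost span.
RULES = [
    ("string", re.compile(r'"[^"\n]*"')),
    ("parens", re.compile(r"\([^()]*\)")),
    ("block", re.compile(r"\{[^{}]*\}")),
]


def iterative_lex(text):
    """Rewrite matched spans into @n@ placeholders until no rule matches.

    Returns the remaining text and a list of (label, matched_text) nodes.
    A node's text may contain placeholders that refer to earlier nodes,
    so the list encodes a tree built from the leaves upward.
    """
    nodes = []
    changed = True
    while changed:
        changed = False
        for label, pattern in RULES:
            def replace(match, label=label):
                nodes.append((label, match.group(0)))
                return f"@{len(nodes) - 1}@"

            text, count = pattern.subn(replace, text)
            changed = changed or count > 0
    return text, nodes


residue, nodes = iterative_lex('f(g(x), "a)b") { h(1); }')
print(residue)
for index, node in enumerate(nodes):
    print(index, node)
```

In this sketch the string literal is replaced before any parenthesis rule runs, so the `)` inside it no longer confuses the later rules. That is a small instance of an unambiguous construct supplying context for a more ambiguous one.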

Similar resources

The Relationship between Syntactic and Lexical Complexity in Speech Monologues of EFL Learners

This study aims to explore the relationship between syntactic and lexical complexity and also the relationship between different aspects of lexical complexity. To this end, speech monologs of 35 Iranian high-intermediate learners of English on three different tasks (i.e. argumentation, description, and narration) were analyzed for correlations between one measure of sy...

The Effect of Reducing Lexical and Syntactic Complexity of Texts on Reading Comprehension

The present study investigated the effect of different types of text simplification (i.e., reducing the lexical and syntactic complexity of texts) on the reading comprehension of English as a Foreign Language (EFL) learners. Sixty female intermediate EFL learners from three intact classes in Tabarestan Language Institute in Tehran participated in the study. The intact classes were assigned to three...

A Contrastive Analysis of Sports Headlines in Two English Newspapers

It holds true that a flourishing field of Contrastive Rhetoric (CR) research has begun to address the way various text types and/or genres may differ across cultures and languages (Corner, 1996). Very much in line with this development, this study was an attempt to characterize the linguistic structures of headlines in the sports section of 2 English newspapers: one non-Iranian (The Times) and one ...

Lexical Bundles in English Abstracts of Research Articles Written by Iranian Scholars: Examples from Humanities

This paper investigates a special type of recurrent expressions, lexical bundles, defined as a sequence of three or more words that co-occur frequently in a particular register (Biber et al., 1999). Considering the importance of this group of multi-word sequences in academic prose, this study explores the forms and syntactic structures of three- and four-word bundles in English abstracts writte...

Hybrid Syntactic Category Induction

Much research has been devoted to the task of learning lexical classes from unannotated input text. Among the chief difficulties facing any approach to the unsupervised induction of lexical classes are token-level ambiguity and the classification of rare and unknown words. Following the work of previous authors, the initial stage of syntactic category induction is treated in the current...

Publication date: 2003